Comparison of the Effective Reproduction Number (Rt) Estimation Methods of COVID-19 Using Simulation Data Based on Available Data from Iran, USA, UK, India, and Brazil

Background: Accurate determination of the effective reproduction number (Rt) is essential in the epidemiology of contagious diseases, including coronavirus disease 2019 (COVID-19). This study compares different methods of estimating the Rt of a susceptible population to identify the most accurate one. Study Design: A secondary study. Methods: Rt was estimated using the attack rate (AR), exponential growth (EG), maximum likelihood (ML), time-dependent (TD), and sequential Bayesian (SB) methods for Iran, the United States, the United Kingdom, India, and Brazil from June to October 2021. To compare these methods rigorously, a simulation study with forty scenarios was designed. Results: The TD and ML methods had the lowest mean square error (MSE) in 15 and 12 of the 40 scenarios, respectively. Based on the TD estimates, Rt values in the United Kingdom (1.33; 95% CI: 1.14-1.52) and the United States (1.25; 95% CI: 1.12-1.38) were substantially higher than those in the other countries, namely Iran (1.07; 95% CI: 0.95-1.19), India (0.99; 95% CI: 0.89-1.08), and Brazil (0.98; 95% CI: 0.84-1.14), from June to October 2021. Conclusion: The main result of this study is that the TD and ML methods estimate the Rt of a population more accurately than the other methods. We therefore suggest using these two methods to monitor and determine the epidemic situation, to predict the incidence rate more accurately, and to control COVID-19 and similar diseases.

methods. 16,17 It is worth noting that in most studies, one or more methods have been chosen arbitrarily to determine R0 and Rt. [18][19][20][21][22][23] Comparing the methods across countries serves to identify which method is most accurate under the different data conditions of each country. A second aim is to compare Rt values across countries to determine in which of them an epidemic situation (Rt > 1) occurred between June and October 2021. A third aim is to investigate whether any reduction in Rt was more noticeable in countries that had a more comprehensive vaccination program during this period.
Because the Rt values estimated by these methods differ considerably, and choosing the most accurate estimate is difficult, a key question in such studies is whether the chosen method was the best one and whether the resulting Rt estimate was the most accurate. Answering this question requires a comprehensive study that compares all of these methods on both simulated and real data to determine the most accurate one. Therefore, in the present study, in addition to a complete comparison of the Rt values obtained by different methods in different countries, simulated data from different scenarios were used to identify the best and most accurate method. Choosing the best method, and thereby obtaining an accurate estimate of the effective reproduction number, makes it possible to detect the epidemic status of the virus.

Methods
The present study used five methods, namely AR, EG, ML, TD, and SB, to estimate Rt. These methods are available in the R0 statistical package, and the inputs required to implement each of them were extracted from similar studies conducted over the same period in each country. 24 These inputs included the time of onset, peak, and end of the epidemic, which were determined using the available data and sensitivity analysis. It was also necessary to determine the generation time (GT) distribution accurately; following similar studies, gamma distributions with different parameters were used. [25][26][27][28][29] In addition, AR values for each of the studied countries were extracted from similar studies and used as inputs to the AR method. [30][31][32][33] Finally, after Rt was estimated with the default and optimal approaches in each country, the methods were compared using simulated data under different scenarios. In the default approach, the same epidemic length was assumed for all countries. In the optimal approach, by contrast, the options available in the package were used to set a separate epidemic length for each country, based on its number of daily cases and the available data.
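Every estimator above consumes the GT distribution as a set of daily weights. The Python sketch below shows one way to turn a gamma GT distribution into such weights. It assumes that the two gamma parameters are a mean and a standard deviation in days, and it gives day i the probability mass on the interval (i - 1, i]; the R0 package may discretize differently, and the function names are hypothetical.

```python
import math


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Density of a gamma distribution (shape/scale parameterization) at x."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def discretize_gamma_gt(mean: float, sd: float, max_days: int = 20,
                        steps_per_day: int = 200) -> list:
    """Return daily GT weights w[0..max_days] with w[0] = 0.

    Assumption: (mean, sd) are the gamma mean and standard deviation in days,
    so shape = mean^2 / sd^2 and scale = sd^2 / mean. Day i gets the mass on
    the interval (i - 1, i], integrated with the midpoint rule.
    """
    shape = mean ** 2 / sd ** 2
    scale = sd ** 2 / mean
    h = 1.0 / steps_per_day
    weights = [0.0]  # no transmission with a generation time of 0 days
    for day in range(1, max_days + 1):
        mass = sum(gamma_pdf(day - 1 + (k + 0.5) * h, shape, scale) * h
                   for k in range(steps_per_day))
        weights.append(mass)
    total = sum(weights)  # renormalize after truncating the tail at max_days
    return [w / total for w in weights]


if __name__ == "__main__":
    w = discretize_gamma_gt(4.55, 3.30)
    print(len(w), round(sum(w), 6))
```

For example, discretize_gamma_gt(4.55, 3.30) returns 21 weights that sum to 1. Because each day collects the mass of the preceding unit interval, the discrete mean sits roughly half a day above the continuous mean; a centred binning would shift this.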

Data
This study used two data sets. The first consists of actual COVID-19 data from Iran, the United States, the United Kingdom, India, and Brazil, collected daily from the Worldometers website; the other inputs, as mentioned before, were extracted from similar studies. These data cover the period from the beginning of June to the end of September 2021, that is, before the Omicron variant had been identified. These countries were chosen to compare Iran with the four countries that had the largest outbreaks during this four-month period. The second data set consists of simulated data based on the scenarios described in detail below.

Statistical analysis
As mentioned before, the statistical models applied in this study are the AR, EG, ML, TD, and SB models, which are available in the R0 statistical package. In addition to the R0, EpiEstim, and EpiCurve packages, custom code was written in R to estimate Rt, simulate data, and compare the models.

Models
Attack rate method (AR)
In equation (1), AR is the attack rate, and S0 is the initial proportion of the population that is susceptible. 18,34
Exponential growth method (EG)
In equation (2), M is the moment-generating function of the generation time (GT) distribution, and r is the exponential growth rate, estimated by Poisson regression. 35
Maximum likelihood method (ML)
In equation (3), N0, N1, ..., Nt denote the incident cases over successive time steps, and wi denotes the GT distribution. μt is the mean of the Poisson distribution of Nt, and R is estimated by maximizing the likelihood. 36,37
Time-dependent method (TD)
In equation (4), Rj is the effective reproduction number for the j-th person.
Sequential Bayesian method (SB)
In equation (5), N0, N1, ..., Nt+1 follow a Poisson distribution. This equation differs fundamentally from the previous ones: equations 1 to 4 estimate the effective reproduction number (Rt) with classical inference, whereas equation 5 uses Bayesian inference. In this equation, P(R│Ni) is the posterior probability distribution, L(R;Ni) is the likelihood function, and P(R) is the prior probability distribution, which is taken from the posterior distribution obtained from the previous days. R is estimated as the value that maximizes the posterior distribution. 21,39
Simulation study
To compare the studied methods and identify the most accurate method for estimating the effective reproduction number (Rt), we designed a simulation study based on different scenarios. To make the simulated data resemble the real data, the data of the five countries (Iran, USA, UK, India, and Brazil) were used to design the scenarios, which were defined by the GT distribution and by the distribution of new cases according to its dispersion.
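For reference, the standard forms of the five estimators, as we understand them from the R0 package documentation, can be written in the notation of this section as below. This is a hedged sketch rather than a copy of equations (1) to (5): the exact TD weighting and the SB transition model (with γ assumed to be the reciprocal of the mean generation time) may differ in detail from the original equations.

```latex
\begin{aligned}
&\text{(AR)} && R = -\frac{\log\!\left(\frac{1-AR}{S_0}\right)}{AR-(1-S_0)} \\[4pt]
&\text{(EG)} && R = \frac{1}{M(-r)} \\[4pt]
&\text{(ML)} && \mathcal{L}(R) = \prod_{t=1}^{T} \frac{e^{-\mu_t}\,\mu_t^{N_t}}{N_t!},
  \qquad \mu_t = R \sum_{i=1}^{k} N_{t-i}\, w_i \\[4pt]
&\text{(TD)} && R_j = \sum_{i} p_{ij},
  \qquad p_{ij} = \frac{w(t_i - t_j)}{\sum_{k \neq i} w(t_i - t_k)} \\[4pt]
&\text{(SB)} && P(R \mid N_0,\dots,N_{t+1}) \propto L(R; N_{t+1})\, P(R \mid N_0,\dots,N_t),
  \qquad N_{t+1} \sim \mathrm{Poisson}\!\left(N_t\, e^{\gamma (R-1)}\right)
\end{aligned}
```

With S0 = 1 the AR form reduces to the familiar final-size relation R = -log(1 - AR)/AR, and in the SB line the posterior for day t becomes the prior for day t + 1, matching the description of P(R) above.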
The gamma distribution (4.55, 3.30) as the GT distribution and an epidemic interval of 60 days with a peak at day 40 were used in scenarios 1-8; the gamma distribution (4.70, 2.90) and an epidemic interval of 54 days with a peak at day 36 in scenarios 9-16; the gamma distribution (5.00, 2.24) and an epidemic interval of 20 days with a peak at day 10 in scenarios 17-24; the gamma distribution (6.00, 3.80) and an epidemic interval of 40 days with a peak at day 21 in scenarios 25-32; and the gamma distribution (3.97, 3.29) and an epidemic interval of 41 days with a peak at day 30 in scenarios 33-40. Within each group of eight scenarios, either the negative binomial or the Poisson distribution was used as the distribution of new cases, combined with four values of Rt (1, 1.5, 2, and 3), giving 2 × 4 = 8 scenarios per group and 40 in total. Each simulated epidemic started with one case at time t = 0, and secondary cases were then generated for each case from the chosen distribution. In total, 10 000 epidemics with more than 50 cases were simulated for each scenario. Finally, the epidemic data were aggregated daily and weekly (7-day totals), and the methods were compared by calculating relative bias and MSE; the superior method was the one that had the lowest MSE in addition to a low relative bias in estimating Rt.
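The simulation design described above can be sketched as a branching process in Python. This is not the authors' code: it assumes the gamma parameters are a mean and standard deviation in days, parameterizes the negative binomial by its mean R and a hypothetical dispersion (size) parameter that the text does not specify, and uses an illustrative horizon and case cap; all function names are hypothetical.

```python
import heapq
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def offspring(r: float, rng: random.Random, dispersion=None) -> int:
    """Secondary cases per case: Poisson(r), or negative binomial with mean r
    and size `dispersion`, drawn as a gamma-Poisson mixture."""
    if dispersion is None:
        return poisson(r, rng)
    return poisson(rng.gammavariate(dispersion, r / dispersion), rng)


def simulate_epidemic(r, gt_mean, gt_sd, horizon=40, dispersion=None,
                      max_cases=20_000, rng=None):
    """Daily incidence of one branching-process epidemic started by 1 case at t = 0."""
    rng = rng or random.Random()
    shape, scale = gt_mean ** 2 / gt_sd ** 2, gt_sd ** 2 / gt_mean
    daily = [0] * horizon
    daily[0] = 1
    total = 1
    pending = [0.0]  # min-heap of infection times whose offspring are not yet drawn
    while pending:
        t = heapq.heappop(pending)
        for _ in range(offspring(r, rng, dispersion)):
            if total >= max_cases:  # hard cap; the last days are then incomplete
                return daily
            child_t = t + rng.gammavariate(shape, scale)  # generation time
            if child_t < horizon:
                daily[int(child_t)] += 1
                total += 1
                heapq.heappush(pending, child_t)
    return daily


def simulate_scenario(n_epidemics, r, gt_mean, gt_sd, min_cases=50,
                      max_attempts=1_000_000, **kwargs):
    """Keep simulating until n_epidemics runs have more than min_cases cases."""
    kept, attempts = [], 0
    while len(kept) < n_epidemics and attempts < max_attempts:
        attempts += 1
        daily = simulate_epidemic(r, gt_mean, gt_sd, **kwargs)
        if sum(daily) > min_cases:
            kept.append(daily)
    return kept


def weekly(daily):
    """7-day totals of a daily series."""
    return [sum(daily[i:i + 7]) for i in range(0, len(daily), 7)]
```

For example, simulate_scenario(100, 1.5, 4.55, 3.30, dispersion=0.5, rng=random.Random(1)) draws 100 negative binomial epidemics for the first GT setting. Processing cases in order of infection time means that, if the cap is reached, only the latest days are truncated.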

Application
In line with the purpose of the study, Rt values were first estimated for all five countries using the different methods, with both the default and optimal approaches. As expected, in the optimal approach there was no significant difference in the estimated Rt values among the methods. In this period, the highest Rt values belonged to the UK and the USA. The TD estimates showed that the Rt values for the UK (1.33; 95% CI: 1.14-1.52) and the USA (1.25; 95% CI: 1.12-1.38) were substantially higher than those in the other countries, namely Iran (1.07; 95% CI: 0.95-1.19), India (0.99; 95% CI: 0.89-1.08), and Brazil (0.98; 95% CI: 0.84-1.14), from June to October 2021. Moreover, since the estimated Rt values were greater than 1, an epidemic situation occurred during this period in these two countries as well as in Iran (Table 1). However, differences remain among the Rt estimation methods, and the reader may be unsure which one gives the most accurate estimate. To answer this question and choose the most accurate method, we used the simulation study, the results of which are presented in Table 2.
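A minimal Python sketch of the AR, EG, and ML estimators follows, written from the standard formulas rather than from the R0 package API; all names are hypothetical. The begin and end arguments of the ML function stand in for the choice of epidemic window: passing the same window for every country mimics the default approach, and a country-specific window mimics the optimal approach. The growth rate r for EG is fitted by Poisson regression, as the text describes, here via Newton-Raphson.

```python
import math


def estimate_ar(attack_rate: float, s0: float = 1.0) -> float:
    """AR method: R = -log((1 - AR) / S0) / (AR - (1 - S0))."""
    return -math.log((1 - attack_rate) / s0) / (attack_rate - (1 - s0))


def fit_growth_rate(cases, max_iter: int = 100, tol: float = 1e-10) -> float:
    """Poisson regression log E[N_t] = a + r*t fitted by Newton-Raphson; returns r."""
    ts = list(range(len(cases)))
    # Starting values: least squares on log(N_t + 0.5).
    ys = [math.log(c + 0.5) for c in cases]
    t_bar, y_bar = sum(ts) / len(ts), sum(ys) / len(ys)
    r = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
         / sum((t - t_bar) ** 2 for t in ts))
    a = y_bar - r * t_bar
    for _ in range(max_iter):
        mu = [math.exp(a + r * t) for t in ts]
        g0 = sum(c - m for c, m in zip(cases, mu))               # score for a
        g1 = sum((c - m) * t for c, m, t in zip(cases, mu, ts))  # score for r
        h00 = sum(mu)                                            # Fisher information
        h01 = sum(m * t for m, t in zip(mu, ts))
        h11 = sum(m * t * t for m, t in zip(mu, ts))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        dr = (h00 * g1 - h01 * g0) / det
        a, r = a + da, r + dr
        if abs(da) + abs(dr) < tol:
            break
    return r


def estimate_eg(growth_rate: float, gt) -> float:
    """EG method: R = 1 / M(-r), M = moment-generating function of the discrete GT."""
    return 1.0 / sum(w * math.exp(-growth_rate * i) for i, w in enumerate(gt))


def estimate_ml(cases, gt, begin: int = 1, end=None) -> float:
    """ML method with a constant R on days begin..end (inclusive).

    N_t ~ Poisson(R * S_t) with S_t = sum_i N_{t-i} w_i, so the log-likelihood
    (sum N_t) log R - R (sum S_t) + const peaks at R = sum N_t / sum S_t.
    """
    end = len(cases) - 1 if end is None else end
    num = den = 0.0
    for t in range(max(begin, 1), end + 1):
        s_t = sum(cases[t - i] * gt[i] for i in range(1, min(t, len(gt) - 1) + 1))
        if s_t > 0:  # skip days with no possible infector in the data
            num += cases[t]
            den += s_t
    if den == 0:
        raise ValueError("no day in the window has a possible infector")
    return num / den
```

Usage: estimate_eg(fit_growth_rate(daily_cases), gt) gives the EG estimate for a daily series and a GT weight vector. Because R is held constant over the window, the ML likelihood has a closed-form maximum, so no numerical optimizer is needed.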
In addition to estimating Rt over the whole period (beginning of June to the end of September 2021), weekly Rt values with corresponding confidence intervals were estimated; the results for all five countries are presented in Figure 1. The Rt trends in these graphs are generally non-linear in all countries over this period. Nevertheless, these graphs allow the epidemic interval of each country to be determined approximately. Figure 2 compares the incidence predicted by the different methods. As can be seen, the SB method, unlike the other methods, underestimates or overestimates the incidence rate for all five countries. Moreover, although the estimates of the three methods are close, the TD estimates are closest to the observed values. However, to determine the most accurate method, and whether TD is really the best method for estimating Rt, a simulation study was designed, the results of which are presented in the next section.
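The TD method can be illustrated with a compact Wallinga-Teunis style estimator on daily counts. This Python sketch is not the R0 implementation: it assumes GT weights with w[0] = 0 (infectors come from strictly earlier days), ignores cases on days with no possible infector, and applies no correction for right-censoring, so estimates for the last days are biased downward because their secondary cases fall outside the data. The function name is hypothetical.

```python
def td_estimates(cases, gt):
    """Time-dependent R for each day's cases from daily counts.

    Each case on day i is attributed to earlier days j with probability
    N_j * w[i - j] / sum_k N_k * w[i - k]; R_j is the expected number of
    later cases attributed to one case on day j. Assumes gt[0] == 0.
    """
    n_days, max_lag = len(cases), len(gt) - 1
    # Total infection pressure on day i from all earlier cases.
    pressure = [sum(cases[i - k] * gt[k] for k in range(1, min(i, max_lag) + 1))
                for i in range(n_days)]
    estimates = []
    for j in range(n_days):
        if cases[j] == 0:
            estimates.append(None)  # no cases on day j, so R_j is undefined
            continue
        r_j = 0.0
        for i in range(j + 1, min(n_days, j + max_lag + 1)):
            if pressure[i] > 0:
                r_j += cases[i] * gt[i - j] / pressure[i]
        estimates.append(r_j)
    return estimates
```

For instance, with a one-day generation time (gt = [0.0, 1.0]) and doubling counts [1, 2, 4, 8], the sketch returns [2.0, 2.0, 2.0, 0.0]; the final 0.0 is the censoring effect described above.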

Comparison of methods
Various scenarios were designed to compare the studied methods, defined by the chosen distributions and by different values of Rt. These scenarios cover the different conditions found in COVID-19 data, allowing a more rigorous comparison of the methods. The results of this simulation study are shown in Table 2. MSE and relative bias were computed for each method in each scenario, and the method with the lowest MSE was selected as the best method for estimating the Rt of the population. Across the 40 scenarios, the lowest MSE was observed for the TD, SB, ML, EG, and AR methods in 15, 6, 12, 1, and 6 scenarios, respectively. Therefore, the TD method had the lowest MSE in more scenarios than any other method, followed by the ML method.
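The comparison rule described above can be sketched in a few lines of Python. The text does not give formulas, so this assumes the common definitions of relative bias, (mean estimate - true R) / true R, and MSE, the mean squared deviation from the true R; the results layout and function names are hypothetical.

```python
from collections import Counter


def relative_bias(estimates, true_r):
    """(mean estimate - true R) / true R."""
    return (sum(estimates) / len(estimates) - true_r) / true_r


def mse(estimates, true_r):
    """Mean squared error of the estimates around the true R."""
    return sum((e - true_r) ** 2 for e in estimates) / len(estimates)


def count_best_methods(results):
    """results maps scenario -> {method: (estimates, true_r)}.

    Returns how many scenarios each method wins by lowest MSE; ties go to the
    method listed first for that scenario.
    """
    wins = Counter()
    for by_method in results.values():
        best = min(by_method, key=lambda m: mse(*by_method[m]))
        wins[best] += 1
    return wins
```

Applied to the 40 scenarios, a tally of this kind is what yields counts such as 15 wins for TD and 12 for ML; relative bias would then be checked for the winning methods.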

Discussion
Comparing Rt across countries, its value in Iran during the study period was lower than in the UK and the USA but higher than in Brazil and India. Given these values, an epidemic situation occurred during this interval in Iran, the UK, and the USA. Notably, the differences in Rt among countries are consistent with the state of the COVID-19 epidemic in each of them. From June to October 2021, the highest weekly Rt value was also observed in the UK, possibly because restrictions were relaxed more in this country than in the others during this period. Moreover, the estimated Rt values in different countries during this period suggest that vaccination had a significant effect in reducing Rt and thereby controlling the disease, 40 since during this period one vaccine dose had been given in some countries and two doses in others. A comparison of the studied methods showed that, in general, the SB method underestimates or overestimates the incidence rate, whereas the incidence predictions of the other methods for COVID-19 are acceptable. These methods can therefore be expected, owing to their accurate predictions, to be more appropriate for estimating the Rt of susceptible populations. However, to determine which method actually gives the most accurate estimate of Rt, a simulation study was used.
Although there are different methods to estimate Rt, no single method has been shown to be superior to the others. However, the five methods TD, SB, ML, EG, and AR are used to estimate Rt more often than other methods. [41][42][43][44][45][46] The aim of the present study was therefore to compare these methods and identify the most accurate method for estimating the Rt of a population. To this end, a simulation study was designed with different scenarios, based on the existing data, so that the methods could be compared more accurately.
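To show where the SB method's one-day-ahead incidence predictions come from, here is a minimal Python sketch of a Bettencourt-Ribeiro style sequential Bayesian update on a grid of R values. It assumes the transition model N(t+1) ~ Poisson(N(t) exp(γ(R - 1))) with γ taken as the reciprocal of the mean generation time, a flat initial prior on [0, r_max], and the posterior mode as the point estimate; it is not the R0 implementation, and the function names are hypothetical.

```python
import math


def sb_estimates(cases, gt_mean, r_max=10.0, n_grid=1001):
    """Sequential Bayesian R estimates (posterior mode) for days 1..T.

    Assumed model: N_{t+1} ~ Poisson(N_t * exp(gamma * (R - 1))) with
    gamma = 1 / gt_mean. Each day's posterior (on a grid over [0, r_max])
    becomes the next day's prior; the first prior is flat.
    """
    gamma = 1.0 / gt_mean
    grid = [r_max * k / (n_grid - 1) for k in range(n_grid)]
    log_post = [0.0] * n_grid  # log of an unnormalized flat prior
    modes = []
    for t in range(1, len(cases)):
        prev, cur = cases[t - 1], cases[t]
        if prev == 0:
            modes.append(None)  # no cases yesterday: the likelihood is uninformative
            continue
        for k, r in enumerate(grid):
            lam = prev * math.exp(gamma * (r - 1.0))
            log_post[k] += cur * math.log(lam) - lam  # Poisson log-likelihood, up to a constant
        peak = max(log_post)
        log_post = [x - peak for x in log_post]  # rescale to avoid underflow
        modes.append(grid[log_post.index(0.0)])
    return modes


def predict_next(cases_today, r, gt_mean):
    """One-day-ahead expected incidence under the same assumed model."""
    return cases_today * math.exp((r - 1.0) / gt_mean)
```

Because each posterior is carried forward as the next prior, the estimate reacts slowly when R changes, which is one plausible reason for systematic over- or underestimation of incidence; this is offered as an illustration, not as the authors' explanation.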
In this study, the MSE values showed that the TD and ML methods estimate Rt more accurately than the other methods. The main reason for this superiority may be the GT distribution: an appropriate and accurate GT distribution is particularly important in estimating Rt, and the TD method makes fuller use of it than the other methods. Another reason is that this method takes into account the incidence of new cases in an epidemic situation, which the other methods do not. Moreover, the TD method requires fewer data details and parameters than the other methods, which is an advantage. 47 The ML method, in turn, is based on maximum likelihood estimation, a well-established method that is asymptotically unbiased and efficient under standard conditions. 48 According to the results, the SB method is relatively acceptable for epidemic conditions, unlike the AR method, which is suitable for non-epidemic conditions. In addition, the poor performance of the EG method in estimating Rt may be attributed to low data dispersion, because this method performs acceptably when the data are highly dispersed. 34,35
Conclusion
According to the findings, to estimate Rt it is better to use the most accurate method than the most common one. The relatively large differences between the Rt estimates of the different methods highlight the importance of comparing them, as done in the present study. Therefore, the main result of this study is that the TD and ML methods provide more accurate estimates of the Rt of a susceptible population than the other methods.
Therefore, to monitor and determine the epidemic situation, and to control COVID-19 and similar diseases, we suggest using these two methods to estimate Rt more accurately. Moreover, the TD and ML methods predict the incidence rate of COVID-19 more accurately than the SB, EG, and AR methods.

Conflict of interest
The authors declare no conflict of interest.

Ethical approval
This manuscript with the code IR.KMU.REC.1400.152 has been approved by the Ethics Committee of Kerman University of Medical Sciences, Kerman, Iran.

Funding
This study received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.